Grammar of graphics and ggplot2

MACS 40700 University of Chicago

Grammar

The whole system and structure of a language or of languages in general, usually taken as consisting of syntax and morphology (including inflections) and sometimes also phonology and semantics.

Grammar of graphics

  • “The fundamental principles or rules of an art or science”
  • A grammar used to describe and create a wide range of statistical graphics.
  • Layered grammar of graphics
    • ggplot2

Layered grammar of graphics

  • Layer
    • Data
    • Mapping
    • Statistical transformation (stat)
    • Geometric object (geom)
    • Position adjustment (position)
  • Scale
  • Coordinate system (coord)
  • Faceting (facet)
  • Defaults
    • Data
    • Mapping

Layer

  • Responsible for creating the objects that we perceive on the plot
  • Defined by its subcomponents

Data and mapping

  • Data defines the source of the information to be visualized
  • Mapping defines how variables in the data are assigned to aesthetics of the graphic (x, y, color, size, ...)

Data: mpg

## Observations: 234
## Variables: 11
## $ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "...
## $ model        <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 qua...
## $ displ        <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0,...
## $ year         <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1...
## $ cyl          <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6...
## $ trans        <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)...
## $ drv          <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4",...
## $ cty          <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 1...
## $ hwy          <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 2...
## $ fl           <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p",...
## $ class        <chr> "compact", "compact", "compact", "compact", "comp...

Mapping: mpg

## # A tibble: 234 x 2
##        x     y
##    <dbl> <int>
##  1  1.80    29
##  2  1.80    29
##  3  2.00    31
##  4  2.00    30
##  5  2.80    26
##  6  2.80    26
##  7  3.10    27
##  8  1.80    26
##  9  1.80    25
## 10  2.00    28
## # ... with 224 more rows
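Conceptually, the mapping above just renames data columns to the aesthetics they feed. A minimal sketch, assuming the dplyr package is installed, reproduces that table with `transmute()`; ggplot2's own evaluation of `aes()` does more than this, so treat it only as an illustration.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Keep only the mapped variables, renamed to the aesthetics they feed
mapped <- transmute(mpg, x = displ, y = hwy)
mapped
```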

Data and mapping

ggplot(data = mpg, mapping = aes(x = displ, y = hwy))
ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy))  # any geom_*() works here

Statistical transformation

  • Transforms the data (typically by summarizing the information)

Raw data

## # A tibble: 234 x 1
##      cyl
##    <int>
##  1     4
##  2     4
##  3     4
##  4     4
##  5     6
##  6     6
##  7     6
##  8     4
##  9     4
## 10     4
## # ... with 224 more rows

Transformed data

## # A tibble: 4 x 2
##     cyl     n
##   <int> <int>
## 1     4    81
## 2     5     4
## 3     6    79
## 4     8    70
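The transformed table is a per-group count of the raw `cyl` values. As a sketch of the summary `stat_count()` computes, assuming dplyr is available, the same table can be produced directly:

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Count the number of cars for each value of cyl
cyl_counts <- count(mpg, cyl)
cyl_counts
```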

Stat transform syntax

stat_output <- stat_count(data = mpg, mapping = aes(x = cyl))
class(stat_output)
## [1] "LayerInstance" "Layer"         "ggproto"
stat_output
## mapping: x = cyl 
## geom_bar: na.rm = FALSE, width = NULL
## stat_count: na.rm = FALSE, width = NULL
## position_stack
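A layer object by itself computes nothing. One way to inspect the data after the stat has run is `layer_data()`, which builds the plot and returns the computed data for one layer. A sketch using `geom_bar()`, whose default stat is `stat_count()`:

```r
library(ggplot2)

# geom_bar() uses stat_count(), which adds a count column to the layer data
p <- ggplot(data = mpg, mapping = aes(x = cyl)) +
  geom_bar()
computed <- layer_data(p, i = 1)
computed[, c("x", "count")]
```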

Geometric objects (geoms)

  • Control the type of plot you create
    • 0 dimensions - point, text
    • 1 dimension - path, line
    • 2 dimensions - polygon, interval
  • Each geom understands a specific set of aesthetics (e.g. shape for points, linetype for lines)
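To see that the geom alone decides how the same mapped data is drawn, this sketch reuses one mapping on the `economics` data set (shipped with ggplot2) with a 0-, 1- and 2-dimensional geom. Using `geom_area()` as the 2-D example is our choice; each call also sets an aesthetic specific to that geom.

```r
library(ggplot2)

# Same data and mapping, three geoms of increasing dimension
base <- ggplot(economics, aes(x = date, y = unemploy))

p_point <- base + geom_point(shape = 1)           # 0-D: points
p_line  <- base + geom_line(linetype = "dashed")  # 1-D: a connected line
p_area  <- base + geom_area(fill = "grey70")      # 2-D: a filled region
```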

Position adjustment

# Two equivalent ways to request the same position adjustment
geom_bar(position = "fill")
geom_bar(position = position_fill())
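A sketch of the common position adjustments on one bar chart of `class`, with bars split by drive train (`drv`). Stacking is the default for `geom_bar()`; `"fill"` stacks and then rescales every bar to a height of 1.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = class, fill = drv))

p_stack <- base + geom_bar()                    # default: position = "stack"
p_dodge <- base + geom_bar(position = "dodge")  # segments side by side
p_fill  <- base + geom_bar(position = "fill")   # stacked, rescaled to proportions
```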

Scale

  • Controls how data is mapped to aesthetic attributes
  • One scale for every aesthetic property employed in a layer

Scale: color

Scale syntax

# General form: scale_<aesthetic>_<type>(), for example
scale_color_brewer(palette = "Set1")
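A sketch with two kinds of scales in one plot: a log10 position scale for x and a ColorBrewer scale for the `color` aesthetic. The palette and legend title are arbitrary choices for illustration. Because scale transformations are applied before the stat, the x values in the layer data are already on the log10 scale.

```r
library(ggplot2)

# Position (x, y) and color are all in use, so each gets a scale
p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_x_log10() +                        # position scale: log10 axis
  scale_color_brewer(palette = "Set1",     # color scale: ColorBrewer palette
                     name = "Drive train")
pts <- layer_data(p)
```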

Coordinate system (coord)

  • Maps the position of objects onto the plane of the plot

Cartesian coordinate system

Semi-log

Polar

Maps
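A sketch of swapping the coordinate system while keeping the layer fixed: a single stacked bar becomes a pie chart under `coord_polar()`. For a semi-log view, this sketch applies the log transform through a scale (`scale_y_log10()`) rather than through a coord.

```r
library(ggplot2)

# A single stacked bar of cars by class
bar <- ggplot(mpg, aes(x = factor(1), fill = class)) +
  geom_bar(width = 1)

p_cartesian <- bar + coord_cartesian()         # the default coordinate system
p_polar     <- bar + coord_polar(theta = "y")  # same layer, drawn as a pie

# Semi-log view: log10 y axis, transform applied by the scale
p_semilog <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  scale_y_log10()
```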

Faceting

  • facet_grid()
  • facet_wrap()
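A sketch of the two faceting functions on the same scatter plot. `facet_wrap()` makes one panel per level of a single variable, while `facet_grid()` crosses two variables and draws a panel for every combination, even empty ones.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# One panel per vehicle class, wrapped into rows
p_wrap <- base + facet_wrap(~ class)

# Rows by drive train, columns by number of cylinders
p_grid <- base + facet_grid(drv ~ cyl)
```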

Themes

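  • Control the display of all non-data elements of the plot (titles, fonts, background, grid lines, legend placement)
  • theme() modifies individual elements
  • theme_*() functions apply a complete theme, e.g. theme_bw(), theme_minimal(), theme_void()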

Default themes
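A sketch combining the two levels: a complete `theme_*()` function sets every non-data element at once, and `theme()` added afterwards overrides individual elements. The base size and the elements changed here are arbitrary examples.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point()

# A complete theme replaces all non-data elements...
p_minimal <- base + theme_minimal()

# ...and theme() adjusts single elements on top of it
p_custom <- base +
  theme_bw(base_size = 14) +
  theme(panel.grid.minor = element_blank(),
        legend.position = "bottom")
```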

Defaults

# Every component of the plot specified explicitly
ggplot() +
  layer(
    data = mpg, mapping = aes(x = displ, y = hwy),
    geom = "point", stat = "identity", position = "identity"
  ) +
  scale_x_continuous() +
  scale_y_continuous() +
  coord_cartesian()

# stat, position, scales and coord left at their defaults
ggplot() +
  layer(
    data = mpg, mapping = aes(x = displ, y = hwy),
    geom = "point"
  )

# Data and mapping as plot defaults; geom_point() supplies stat and position
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Data and the x, y aesthetics passed by position
ggplot(mpg, aes(displ, hwy)) +
  geom_point()
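The fully explicit specification and the shorthand should produce the same layer data. A sketch that builds both and compares the computed x and y positions with `layer_data()`:

```r
library(ggplot2)

# Fully specified layer, scales and coord
p_explicit <- ggplot() +
  layer(data = mpg, mapping = aes(x = displ, y = hwy),
        geom = "point", stat = "identity", position = "identity") +
  scale_x_continuous() +
  scale_y_continuous() +
  coord_cartesian()

# Shorthand relying on defaults
p_short <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

explicit_xy <- layer_data(p_explicit)[, c("x", "y")]
short_xy <- layer_data(p_short)[, c("x", "y")]
```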

Defaults

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()

Defaults

ggplot(mpg) +
  geom_point(aes(displ, hwy)) +
  geom_smooth()
## Error: stat_smooth requires the following missing aesthetics: x, y
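A mapping given inside one geom applies only to that layer, so `geom_smooth()` above receives no x or y. A sketch that reproduces the failure and the fix of declaring the mapping once in `ggplot()`, where every layer inherits it; `method = "lm"` and `formula = y ~ x` are set only to keep the example quiet and fast.

```r
library(ggplot2)

# Mapping only in geom_point(): the smooth layer has no x/y, so building fails
p_bad <- ggplot(mpg) +
  geom_point(aes(displ, hwy)) +
  geom_smooth(method = "lm", formula = y ~ x)
bad_result <- try(ggplot_build(p_bad), silent = TRUE)

# Fix: declare the mapping as a plot default inherited by both layers
p_good <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
smooth_data <- layer_data(p_good, i = 2)
```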

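Carte figurative des pertes successives en hommes de l’Armée Française dans la campagne de Russie 1812–1813 (Figurative map of the successive losses in men of the French Army in the Russian campaign, 1812–1813) by Charles Joseph Minard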

Building Minard’s map in R

glimpse(troops)
## Observations: 51
## Variables: 5
## $ long      <dbl> 24.0, 24.5, 25.5, 26.0, 27.0, 28.0, 28.5, 29.0, 30.0...
## $ lat       <dbl> 54.9, 55.0, 54.5, 54.7, 54.8, 54.9, 55.0, 55.1, 55.2...
## $ survivors <int> 340000, 340000, 340000, 320000, 300000, 280000, 2400...
## $ direction <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A...
## $ group     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
glimpse(cities)
## Observations: 20
## Variables: 3
## $ long <dbl> 24.0, 25.3, 26.4, 26.8, 27.7, 27.6, 28.5, 28.7, 29.2, 30....
## $ lat  <dbl> 55.0, 54.7, 54.4, 54.3, 55.2, 53.9, 54.3, 55.5, 54.4, 55....
## $ city <chr> "Kowno", "Wilna", "Smorgoni", "Moiodexno", "Gloubokoe", "...
glimpse(temps)
## Observations: 9
## Variables: 5
## $ long  <dbl> 37.6, 36.0, 33.2, 32.0, 29.2, 28.5, 27.2, 26.7, 25.3
## $ temp  <int> 0, 0, -9, -21, -11, -20, -24, -30, -26
## $ month <chr> "Oct", "Oct", "Nov", "Nov", "Nov", "Nov", "Dec", "Dec", ...
## $ day   <int> 18, 24, 9, 14, 24, 28, 1, 6, 7
## $ date  <date> 1812-10-18, 1812-10-24, 1812-11-09, 1812-11-14, 1812-11...

Minard’s grammar

  • Troops
    • Latitude
    • Longitude
    • Survivors
    • Advance/retreat
  • Cities
    • Latitude
    • Longitude
    • City name

Create the troop movement layer

ggplot(data = troops,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_path()

Add aesthetics

ggplot(data = troops,
       mapping = aes(x = long, y = lat, group = group,
                     color = direction, size = survivors)) +
  geom_path()

Tweak the path appearance

ggplot(data = troops,
       mapping = aes(x = long, y = lat, group = group,
                     color = direction, size = survivors)) +
  geom_path(lineend = "round")

Adjust the size scale

ggplot(data = troops,
       mapping = aes(x = long, y = lat, group = group,
                     color = direction, size = survivors)) +
  geom_path(lineend = "round") +
  scale_size(range = c(0.5, 15))

Remove extraneous junk

ggplot(data = troops,
       mapping = aes(x = long, y = lat, group = group,
                     color = direction, size = survivors)) +
  geom_path(lineend = "round") +
  scale_size(range = c(0.5, 15)) + 
  scale_color_manual(values = c("#DFC17E", "#252523")) +
  labs(x = NULL,
       y = NULL) + 
  guides(color = "none",
         size = "none")

Create the cities layer

ggplot() +
  geom_path(data = troops,
            mapping = aes(x = long, y = lat, group = group,
                          color = direction, size = survivors),
            lineend = "round") +
  geom_point(data = cities, aes(x = long, y = lat)) +
  geom_text(data = cities, aes(x = long, y = lat, label = city)) +
  scale_size(range = c(0.5, 15)) + 
  scale_color_manual(values = c("#DFC17E", "#252523")) +
  labs(x = NULL,
       y = NULL) + 
  guides(color = "none",
         size = "none")

Adjust city name locations

ggplot() +
  geom_path(data = troops,
            mapping = aes(x = long, y = lat, group = group,
                          color = direction, size = survivors),
            lineend = "round") +
  geom_point(data = cities, aes(x = long, y = lat)) +
  geom_text(data = cities, aes(x = long, y = lat, label = city),
            vjust = 1.5) +
  scale_size(range = c(0.5, 15)) + 
  scale_color_manual(values = c("#DFC17E", "#252523")) +
  labs(x = NULL,
       y = NULL) + 
  guides(color = "none",
         size = "none")

Improve color and font

ggplot() +
  geom_path(data = troops,
            mapping = aes(x = long, y = lat, group = group,
                          color = direction, size = survivors),
            lineend = "round") +
  geom_point(data = cities, aes(x = long, y = lat),
             color = "#DC5B44") +
  geom_text(data = cities, aes(x = long, y = lat, label = city),
            vjust = 1.5,
            color = "#DC5B44", family = "sans") +
  scale_size(range = c(0.5, 15)) + 
  scale_color_manual(values = c("#DFC17E", "#252523")) +
  labs(x = NULL,
       y = NULL) + 
  guides(color = "none",
         size = "none")

Remove all background noise

troops_cities <- ggplot() +
  geom_path(data = troops,
            mapping = aes(x = long, y = lat, group = group,
                          color = direction, size = survivors),
            lineend = "round") +
  geom_point(data = cities, aes(x = long, y = lat),
             color = "#DC5B44") +
  geom_text(data = cities, aes(x = long, y = lat, label = city),
            vjust = 1.5,
            color = "#DC5B44", family = "sans") +
  scale_size(range = c(0.5, 15)) + 
  scale_color_manual(values = c("#DFC17E", "#252523")) +
  labs(x = NULL,
       y = NULL) + 
  guides(color = "none",
         size = "none") +
  theme_void()
troops_cities

Temperatures and time

ggplot(data = temps, aes(x = long, y = temp)) +
  geom_line() +
  geom_label(aes(label = temp), vjust = 1.5)

Make a better label

temps <- temps %>%
  mutate(nice_label = str_c(temp, "°, ", month, ". ", day))

ggplot(data = temps, aes(x = long, y = temp)) +
  geom_line() +
  geom_label(aes(label = nice_label), vjust = 1.5)
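To check the label format, this sketch applies the same `mutate()` to the first three rows of `temps`, with values copied from the `glimpse()` output above (dplyr and stringr assumed installed).

```r
library(dplyr)
library(stringr)

# First three rows of temps, copied from the glimpse() output
temps_sample <- tibble(
  temp = c(0L, 0L, -9L),
  month = c("Oct", "Oct", "Nov"),
  day = c(18L, 24L, 9L)
)

temps_sample <- temps_sample %>%
  mutate(nice_label = str_c(temp, "°, ", month, ". ", day))
temps_sample$nice_label
```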

Clean up the graph

ggplot(data = temps, aes(x = long, y = temp)) +
  geom_line() +
  geom_label(aes(label = nice_label),
            family = "sans", size = 2.5) + 
  labs(x = NULL,
       y = "° Celsius") +
  scale_y_continuous(position = "right") +
  coord_cartesian(ylim = c(-35, 5)) +  # Add some space above/below
  theme_bw(base_family = "sans") +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text.x = element_blank(), axis.ticks = element_blank(),
        panel.border = element_blank())

Align the axes

# panel_params was called panel_ranges before ggplot2 3.0.0
xrange <- ggplot_build(troops_cities)$layout$panel_params[[1]]$x.range
xrange
## [1] 23.3 38.4
temps_plot <- ggplot(data = temps, aes(x = long, y = temp)) +
  geom_line() +
  geom_label(aes(label = nice_label),
            family = "sans", size = 2.5) + 
  labs(x = NULL,
       y = "° Celsius") +
  scale_x_continuous(limits = xrange, expand = c(0, 0)) +  # no extra padding beyond the map's range
  scale_y_continuous(position = "right") +
  coord_cartesian(ylim = c(-35, 5)) +  # Add some space above/below
  theme_bw(base_family = "sans") +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text.x = element_blank(), axis.ticks = element_blank(),
        panel.border = element_blank())
temps_plot
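In ggplot2 3.0.0 and later, the range of each panel of a built plot is stored in `layout$panel_params`. A self-contained sketch on `mpg` shows that this range already includes the default expansion around the data, which is why reusing it as `limits` calls for `expand = c(0, 0)` if the axes are meant to match exactly.

```r
library(ggplot2)

# Build a plot and read the x range of its first panel
p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
built <- ggplot_build(p)
xrange_demo <- built$layout$panel_params[[1]]$x.range
xrange_demo
```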

grid.arrange

library(gridExtra)  # provides grid.arrange()

example.data <- tibble(  # tibble() replaces the deprecated data_frame()
  x = 1:10,
  y = rnorm(10)
)

plot1 <- ggplot(example.data, aes(x = x, y = y)) +
  geom_line() +
  labs(y = "This is a really\nreally really really\nreally tall label")

plot2 <- ggplot(example.data, aes(x = x, y = y)) +
  geom_line() +
  labs(y = NULL)

grid.arrange(plot1, plot2)

gtable_rbind

plot.both <- gtable_rbind(ggplotGrob(plot1),
                          ggplotGrob(plot2))

grid::grid.newpage()
grid::grid.draw(plot.both)
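An alternative sketch, assuming the `rbind()` method that the gtable package provides for gtable objects: `size = "max"` gives each column the larger of the two widths, so both panels start at the same horizontal position. The plot names here are hypothetical, chosen so the example stands on its own.

```r
library(ggplot2)
library(gtable)  # provides the rbind() method for gtable objects
library(grid)

demo_data <- data.frame(x = 1:10, y = rnorm(10))
p_tall <- ggplot(demo_data, aes(x = x, y = y)) + geom_line() +
  labs(y = "This is a really\nreally really really\nreally tall label")
p_plain <- ggplot(demo_data, aes(x = x, y = y)) + geom_line() +
  labs(y = NULL)

# Stack the two plot tables, using the wider width for every column
stacked <- rbind(ggplotGrob(p_tall), ggplotGrob(p_plain), size = "max")
grid.newpage()
grid.draw(stacked)
```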

Combine map and temperature

both_plot <- gtable_rbind(ggplotGrob(troops_cities),
                          ggplotGrob(temps_plot))

grid::grid.newpage()
grid::grid.draw(both_plot)

Adjust relative panel heights

# Identify which layout elements are panels
panels <- both_plot$layout$t[grep("panel", both_plot$layout$name)]
panels
## [1]  6 16
# Let's try a 3:1 ratio
both_plot$heights[panels] <- unit(c(3, 1), "null")

grid::grid.newpage()
grid::grid.draw(both_plot)